Add sanitization and validation for files #3

amirRamirfatahi · 2024-12-04T06:33:35Z

No description provided.

tipogi

What about that comment? I cannot see anything related to it; is there a reason for it?

SHAcollision · 2024-12-11T17:14:55Z

src/file.rs

+        }
+
+        // Validate size
+        if self.size <= 0 || self.size > MAX_SIZE {


Are we taking the client-declared size in /pubky.app/file/<object> at face value? Is there any way we can actually check the size of the blob? Maybe we should have a blob model whose validation must pass after we get the blob response in ingest() and before store_blob() in events/handlers/file.rs.

amirRamirfatahi · Dec 18, 2024

I don't think the user has any incentive to lie about the size of the file.
Having said that, this looks similar to the content-type issue. Our best check would be to make sure the declared size matches the content-length header returned from the homeserver.

SHAcollision · Dec 19, 2024

Is this something we can write into this specs crate? It should probably be part of the /blob validation, which would have to run as a child validation of the /file one.
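A minimal sketch of the check discussed in this thread, assuming the indexer already holds the fetched blob bytes. `FileMeta`, `BlobError`, and `validate_blob` are hypothetical names for illustration, not the crate's actual API, and the limit used here is only a placeholder.

```rust
/// Hypothetical stand-in for the metadata stored at /pubky.app/file/<object>.
pub struct FileMeta {
    /// Size in bytes as declared by the client.
    pub size: i64,
}

/// Hypothetical error type for blob validation.
#[derive(Debug, PartialEq)]
pub enum BlobError {
    Empty,
    TooLarge { actual: usize },
    SizeMismatch { declared: i64, actual: usize },
}

/// Placeholder limit (10 * 2^20 bytes); the PR's own constant may differ.
pub const MAX_BLOB_SIZE: usize = 10 * (1 << 20);

/// Checks the fetched bytes against the limit and against the declared size,
/// instead of trusting the declared size alone.
pub fn validate_blob(meta: &FileMeta, blob: &[u8]) -> Result<(), BlobError> {
    let actual = blob.len();
    if actual == 0 {
        return Err(BlobError::Empty);
    }
    if actual > MAX_BLOB_SIZE {
        return Err(BlobError::TooLarge { actual });
    }
    // A negative declared size can never match a real length.
    if meta.size < 0 || meta.size as u64 != actual as u64 {
        return Err(BlobError::SizeMismatch {
            declared: meta.size,
            actual,
        });
    }
    Ok(())
}
```

In this sketch the check would run between fetching the blob and storing it, which is where the thread suggests placing it. Comparing against the content-length header instead of the body length would follow the same shape.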

SHAcollision · 2024-12-11T17:17:09Z

src/file.rs

+
+        Self {
+            name,
+            created_at: self.created_at,


I don't understand why we cannot use hash_id. We have a created_at timestamp anyway, so a timestamp id duplicates that information. With hash_id, the user can post the same .gif 3 times without creating the file 3 times on his homeserver or the indexer.

Alternatively, we have the name field, which could just as well be the file and blob name instead.

amirRamirfatahi · Dec 18, 2024

What is the emphasis on hash_id for? I understand its use case for something like tags, but we don't need that here.
The use case you're mentioning is not solved by having the same hash_id. With the current approach, a sane user can create one blob and one file, then use that one file anywhere he wants. Should we remove the ability to create multiple blobs and multiple files? I don't think so. I think it's important to remember that we're a proxy here and people can do whatever they want with their homeservers. So while we enable use cases, we shouldn't look for ways to prohibit some of them, no matter how fringe they might sound to us, when there's no problem with having them.

SHAcollision · Dec 19, 2024

Sorry, I am not necessarily saying hash_id should go on the /file object only; I am talking about the obvious big benefits of hash_id for the data blob. Are the /blobs and the way Nexus uses them covered by the specs?

There is no use for the timestamp ids.

Consider what happens when, in pubky-app, I am a shitposter and reply to 10 users by dropping the same animated gif into the post editor modal to make fun of them.

> people can do whatever they want with their homeservers

This is barely an argument: the specs are created with the explicit intention of restricting how data is written into homeservers to a common set of rules that allows interoperability between social pubky-app clients and indexers. A user can write whatever data he wants, in whatever spec-breaking way he wants, and simply share a URI.

> when there's no problem with having them

I think free storage savings for anyone using pubky-app according to the specs are really good for such a small change (just hash_id for blobs). It might seem a stupid optimization this early, but changing IDs and schemas post-launch will be harder.

amirRamirfatahi · Dec 20, 2024

For blob ids it does make sense to have hash_ids.
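A rough sketch of the deduplication argument above, assuming blob ids are derived from the blob's bytes. `blob_id` and `BlobStore` are hypothetical names, and FNV-1a is used only as a dependency-free stand-in: it is not collision-resistant, so a real id scheme would need a cryptographic hash.

```rust
use std::collections::HashMap;

/// FNV-1a (64-bit). Stand-in only: NOT collision-resistant.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hypothetical content-derived id: same bytes always give the same id.
pub fn blob_id(bytes: &[u8]) -> String {
    format!("{:016x}", fnv1a_64(bytes))
}

/// Hypothetical in-memory store keyed by content-derived id.
#[derive(Default)]
pub struct BlobStore {
    blobs: HashMap<String, Vec<u8>>,
}

impl BlobStore {
    /// Stores the bytes only if no blob with the same id exists yet.
    pub fn put(&mut self, bytes: &[u8]) -> String {
        let id = blob_id(bytes);
        self.blobs
            .entry(id.clone())
            .or_insert_with(|| bytes.to_vec());
        id
    }

    /// Number of distinct blobs currently stored.
    pub fn len(&self) -> usize {
        self.blobs.len()
    }
}
```

With this shape, dropping the same gif into 10 replies calls `put` 10 times but keeps a single stored blob, whereas timestamp ids would produce 10 separate entries.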

SHAcollision · 2024-12-11T17:19:49Z

src/file.rs

+        // validate content type
+        match Mime::from_str(&self.content_type) {
+            Ok(mime) => {
+                if !VALID_MIME_TYPES.contains(&mime.essence_str()) {


The declared MIME type could be pdf while the blob is actually an exe, right? When the user clicks it in the browser from pubky-app, will they download a .exe?

amirRamirfatahi · Dec 18, 2024

Yes. The user doesn't really have an incentive to lie about this.
Ultimately, of course, we could have a security checker that runs the files through a malware detector, but mind you, something like this needs to be implemented on the homeserver first. The same goes for the content type and sniffing the actual file type of the blob.

SHAcollision · Dec 19, 2024

> The user doesn't really have an incentive to lie about this.

Can you elaborate?

amirRamirfatahi · Dec 20, 2024

I'm saying that normal users have no need to lie about the content type of their files.
For malicious intent, you don't have to lie about the content type. IMO, if there are to be checks for this, they should be on the homeserver, not here.
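For reference, a minimal sketch of what "sniffing the actual file type" could look like wherever it ends up running (homeserver or indexer). `sniff_mime` and `content_type_matches` are hypothetical helpers, and the signature table is a tiny illustrative subset, not a complete detector.

```rust
/// Well-known leading byte signatures (illustrative subset only).
const SIGNATURES: &[(&[u8], &str)] = &[
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    // "MZ" is the DOS/PE executable header used by Windows .exe files.
    (b"MZ", "application/vnd.microsoft.portable-executable"),
];

/// Guesses a MIME type from the first bytes of the blob, if recognized.
pub fn sniff_mime(bytes: &[u8]) -> Option<&'static str> {
    for (magic, mime) in SIGNATURES {
        if bytes.starts_with(magic) {
            return Some(*mime);
        }
    }
    None
}

/// Returns true only when the sniffed type equals the declared one.
pub fn content_type_matches(declared: &str, bytes: &[u8]) -> bool {
    // Drop parameters such as "; charset=utf-8" before comparing.
    let essence = declared.split(';').next().unwrap_or("").trim();
    match sniff_mime(bytes) {
        Some(sniffed) => sniffed.eq_ignore_ascii_case(essence),
        // Unknown signature: treated as a mismatch in this sketch.
        None => false,
    }
}
```

Under these assumptions, a blob declared as application/pdf whose bytes start with "MZ" would be rejected. Treating unknown signatures as a mismatch is a design choice of the sketch and would be too strict for text formats that have no magic bytes.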

SHAcollision · 2024-12-11T17:27:44Z

src/file.rs

+const MIN_NAME_LENGTH: usize = 1;
+const MAX_NAME_LENGTH: usize = 255;
+const MAX_SRC_LENGTH: usize = 1024;
+const MAX_SIZE: i64 = 10_000_000; // 10 MB


Are we using decimal MB?

To avoid confusion, it may be best to use binary units (2^20 B per MB):

const MAX_SIZE: i64 = 10 * (1 << 20);

or 10_485_760
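To make the difference concrete, a small sketch comparing the two readings of "10 MB". The constant names and `is_valid_size` are illustrative only; the bounds check mirrors the `size <= 0 || size > MAX_SIZE` condition from the diff.

```rust
/// 10 MB in decimal units: 10 * 10^6 bytes.
pub const MAX_SIZE_DECIMAL: i64 = 10_000_000;

/// 10 MiB in binary units: 10 * 2^20 bytes = 10_485_760.
pub const MAX_SIZE_BINARY: i64 = 10 * (1 << 20);

/// Accepts sizes in the range 1..=max.
pub fn is_valid_size(size: i64, max: i64) -> bool {
    size > 0 && size <= max
}
```

The two limits differ by 485_760 bytes, so a file of 10_200_000 bytes passes the binary limit but fails the decimal one.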

Add sanitization and validation for files

5903a04

amirRamirfatahi requested a review from tipogi December 4, 2024 06:33

tipogi approved these changes Dec 4, 2024

tipogi linked an issue Dec 4, 2024 that may be closed by this pull request

Feat: files validation, sanitization, docs. #2

Open

tipogi self-requested a review December 4, 2024 08:25

fix mime type not actually validating

dac2b73

SHAcollision reviewed Dec 11, 2024

amirRamirfatahi added 3 commits December 18, 2024 03:21

fix file max size

dc84075

Add PubkyAppBlob

5e3e2d2

Add file blob type

f485401

amirRamirfatahi commented Dec 4, 2024

tipogi left a comment (edited)

SHAcollision Dec 11, 2024 (edited)

amirRamirfatahi Dec 18, 2024

SHAcollision Dec 19, 2024

SHAcollision Dec 11, 2024

amirRamirfatahi Dec 18, 2024

SHAcollision Dec 19, 2024 (edited)

amirRamirfatahi Dec 20, 2024

SHAcollision Dec 11, 2024

amirRamirfatahi Dec 18, 2024

SHAcollision Dec 19, 2024

amirRamirfatahi Dec 20, 2024

SHAcollision Dec 11, 2024
